Adaptive pointwise estimation in time-inhomogeneous

This paper offers a new method for estimation and forecasting of the volatility of financial time
series when the stationarity assumption is violated. Our general local parametric approach 
particularly applies to general varying-coefficient parametric models, such as GARCH, whose 
coefficients may arbitrarily vary with time. Global parametric, smooth transition, and change- 
point models are special cases. The method is based on an adaptive pointwise selection of the 
largest interval of homogeneity with a given right-end point by a local change-point analysis. We 
construct locally adaptive estimates that can perform this task and investigate them both from 
the theoretical point of view and by Monte Carlo simulations. In the particular case of GARCH 
estimation, the proposed method is applied to stock-index series and is shown to outperform 
the standard parametric GARCH model.

1 Introduction

A growing amount of econometric and statistical research is devoted to modeling financial
time series and their volatility, which measures dispersion at a point in time (i.e.,
conditional variance). Although many economies and financial markets have recently
experienced shorter and longer periods of instability or uncertainty, such as the Asian
crisis (1997), the Russian crisis (1998), the start of the European currency (1999), the "dot-com"
technology-bubble crash (2000-2002), the terrorist attacks (September, 2001), the war
in Iraq (2003), and the current global recession (2008), the most commonly used econometric
models are based on the assumption of time homogeneity. This includes linear and nonlinear
autoregressive (AR) and moving-average models and conditional heteroscedasticity (CH)
models such as ARCH (Engle, 1982) and GARCH (Bollerslev, 1986), stochastic volatility
models (Taylor, 1986), as well as their combinations such as AR-GARCH.

On the other hand, the market and institutional changes have long been assumed to 
cause structural breaks in financial time series, which was confirmed, for example, in data 
on stock prices (Andreou and Ghysels, 2002; Beltratti and Morana, 2004) and exchange 
rates (Herwatz and Reimers, 2001). Moreover, ignoring these breaks can adversely affect 
the modeling, estimation, and forecasting of volatility as suggested by Diebold and Inoue 
(2001), Mikosch and Starica (2004), Pesaran and Timmermann (2004), and Hillebrand 
(2005), for instance. Such findings led to the development of the change-point analysis in 
the context of CH models; see for example, Chen and Gupta (1997), Kokoszka and Leipus 
(2000), and Andreou and Ghysels (2006). 

An alternative approach lies in relaxing the assumption of time homogeneity and 
allowing some or all model parameters to vary over time (Chen and Tsay, 1993; Cai et 
al., 2000; Fan and Zhang, 2008). Without structural assumptions about the transition 
of model parameters over time, time-varying coefficient models have to be estimated 
nonparametrically, for example, under the identification condition that their parameters 
are smooth functions of time (Cai et al., 2000). In this paper, we follow a different strategy 
based on the assumption that a time series can be locally, that is over short periods of 
time, approximated by a parametric model. As suggested by Spokoiny (1998), such a local 
approximation can form a starting point in the search for the longest period of stability 
(homogeneity), that is, for the longest time interval in which the series is described well by 
the parametric model. In the context of the local constant approximation, this strategy 
was employed for volatility modeling by Hardle et al. (2003), Mercurio and Spokoiny 
(2004), and Spokoiny (2008). Our aim is to generalize this approach so that it can identify 
intervals of homogeneity for any parametric CH model regardless of its complexity. 

In contrast to the local constant approximation of the volatility of a process (Mercurio
and Spokoiny, 2004), the main benefit of the proposed generalization is the
possibility to apply the methodology to a much wider class of models and to forecast over
a longer time horizon. The reason is that approximating the mean or volatility process by
a constant is in many cases too restrictive or even inappropriate, and it holds only over
short time intervals, which precludes its use for longer-term forecasting. By contrast,
parametric models like GARCH mimic the majority of stylized facts about financial time
series and can reasonably fit the data over rather long periods of time in many practical
situations. Allowing for time dependence of model parameters then offers much more
flexibility in modeling real-life time series, with or without structural
breaks, since global parametric models are included as a special case.

Moreover, the proposed adaptive local parametric modeling unifies the change-point 
and varying-coefficient models. First, since finding the longest time-homogeneous interval 
for a parametric model at any point in time corresponds to detecting the most recent 
change-point in a time series, this approach resembles the change-point modeling as in 
Bai and Perron (1998) or Mikosch and Starica (1999, 2004), for instance, but it does not 
require prior information such as the number of changes. Additionally, the traditional 
structural-change tests require that the number of observations before each break point is 
large (and can grow to infinity) as these tests rely on asymptotic results. By contrast,
the proposed pointwise adaptive estimation does not rely on asymptotic results and thus
places no requirements on the number of observations before, between, or after
any break point. Second, since the adaptively selected time-homogeneous interval used for
estimation necessarily differs at each time point, the model coefficients can arbitrarily vary
over time. In comparison to varying-coefficient models assuming smooth development of
parameters over time (Cai et al., 2000), our approach, however, allows for structural breaks
in the form of sudden jumps in parameter values.

Although seemingly straightforward, extending the procedure of Mercurio and Spokoiny (2004)
to local parametric modeling is a nontrivial problem, which requires new tools
and techniques. We concentrate here on the change-point estimation of financial time
series, which are often modelled by data-demanding models such as GARCH. While the
benefits of a flexible change-point analysis for time series spanning several years are well
known, its feasibility (which is the focus of this work) is much more difficult to
achieve. The reason is that, at each time point, the procedure starts from a small
interval, where a local parametric approximation holds, and then iteratively extends this
interval and tests it for time-homogeneity until a structural break is found or the data are
exhausted. Hence, a model has to be initially estimated on very short time intervals (e.g.,
10 observations). Using standard testing methods, such a procedure might be feasible for
simple parametric models, but it is hardly possible for more complex parametric models
such as GARCH that generally require rather large samples for reasonably good estimates.

Therefore, we use an alternative and more robust approach to local change-point
analysis that relies on a finite-sample theory of testing a growing sequence of historical
time intervals for homogeneity against a change-point alternative. The proposed adaptive
pointwise estimation procedure applies to a wide class of time-series models, including
AR and CH models. Concentrating on the latter, we describe the adaptive procedure in
detail, derive its basic properties, and, focusing on the feasibility of adaptive estimation
for CH models, study its performance in comparison to the parametric (G)ARCH by
means of simulations and real-data applications. The main conclusion is two-fold: on one
hand, the adaptive pointwise estimation is feasible and beneficial also in the case of
data-demanding models such as GARCH; on the other hand, the adaptive estimates based on
various parametric models such as constant, ARCH, or GARCH models are much closer
to each other (while being better than the usual parametric estimates), which eliminates
to some extent the need for using too complex models in adaptive estimation.

The rest of the paper is organized as follows. In Section 2, the parametric estimation
of CH models and its finite-sample properties are introduced. In Section 3, we define
the adaptive pointwise estimation procedure and discuss the choice of its parameters.
Theoretical properties of the method are discussed in Section 4. In the specific case of
the ARCH(1) and GARCH(1,1) models, a simulation study illustrates the performance of
the new methodology with respect to the standard parametric and change-point models
in Section 5. Applications to real stock-index series data are presented in Section 6. The
proofs are provided in the Appendix.

2 Parametric conditional heteroscedasticity models 

Consider a time series $Y_t$ in discrete time, $t \in \mathbb{N}$. The conditional heteroscedasticity
assumption means that $Y_t = \sigma_t \varepsilon_t$, where $\{\varepsilon_t\}_{t \in \mathbb{N}}$ is a white noise process and
$\{\sigma_t\}_{t \in \mathbb{N}}$ is a predictable volatility (conditional variance) process. Modelling of the
volatility process $\sigma_t$ typically relies on some parametric CH specification such as the
ARCH (Engle, 1982) and GARCH (Bollerslev, 1986) models:

$$\sigma_t^2 = \omega + \sum_{i=1}^{p} \alpha_i Y_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2, \qquad (2.1)$$

where $p \in \mathbb{N}$, $q \in \mathbb{N}$, and $\theta = (\omega, \alpha_1, \ldots, \alpha_p, \beta_1, \ldots, \beta_q)^\top$ is the parameter vector. An
attractive feature of this model is that, even with very few coefficients, one can model
most stylized facts of financial time series like volatility clustering or excessive kurtosis,
for instance. A number of (G)ARCH extensions were proposed to make the model even
more flexible; for example, EGARCH (Nelson, 1991), QGARCH (Sentana, 1995), and
TGARCH (Glosten et al., 1993) that account for asymmetries in a volatility process.

All such CH models can be put into a common class of generalized linear volatility
models:

$$Y_t = \sigma_t \varepsilon_t = \sqrt{g(X_t)}\, \varepsilon_t, \qquad (2.2)$$

$$X_t = \omega + \sum_{i=1}^{p} \alpha_i h(Y_{t-i}) + \sum_{j=1}^{q} \beta_j X_{t-j}, \qquad (2.3)$$

where $g$ and $h$ are known functions and $X_t$ is a (partially) unobserved process (structural
variable) that models the volatility coefficient $\sigma_t^2$ via transformation $g$: $\sigma_t^2 = g(X_t)$. For
example, the GARCH model (2.1) is described by $g(u) = u$ and $h(r) = r^2$.
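
To make the structure of (2.2)-(2.3) concrete, the following minimal Python sketch simulates the model for p = q = 1. The function name `simulate_ch`, the Gaussian innovations, and the starting value of the structural variable are assumptions made for this illustration only; with the default choices g(u) = u and h(r) = r^2 it reduces to a GARCH(1,1) path.

```python
import math
import random

def simulate_ch(n, omega, alpha, beta, g=lambda u: u, h=lambda r: r * r, seed=0):
    """Simulate Y_t = sqrt(g(X_t)) * eps_t, X_t = omega + alpha*h(Y_{t-1}) + beta*X_{t-1}."""
    rng = random.Random(seed)
    # Assumed starting value: the unconditional variance of a stationary
    # GARCH(1,1); it falls back to omega when alpha + beta >= 1.
    x = omega / (1.0 - alpha - beta) if alpha + beta < 1 else omega
    ys, xs = [], []
    for _ in range(n):
        y = math.sqrt(g(x)) * rng.gauss(0.0, 1.0)  # Gaussian innovations (assumption)
        ys.append(y)
        xs.append(x)
        x = omega + alpha * h(y) + beta * x        # recursion (2.3) with p = q = 1
    return ys, xs

ys, xs = simulate_ch(500, omega=0.1, alpha=0.1, beta=0.8)
```

Other members of the class are obtained by passing different `g` and `h`, provided `g` stays positive on the simulated path.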

Model (2.2)-(2.3) is time homogeneous in the sense that the process $Y_t$ follows the
same structural equation at each time point. In other words, the parameter $\theta$ and hence
the structural dependence in $Y_t$ is constant over time. Even though models like (2.2)-(2.3)
can often fit data well over a longer period of time, the assumption of homogeneity
is too restrictive in practical applications: to guarantee a sufficient amount of data for
sufficiently precise estimation, these models are often applied over time spans of many
years. On the contrary, the strategy pursued here requires only local time homogeneity,
which means that at each time point $t$ there is a (possibly rather short) interval $[t - m, t]$,
where the process $Y_t$ is well described by model (2.2)-(2.3). This strategy then aims both
at finding an interval of homogeneity (preferably as long as possible) and at the estimation
of the corresponding parameter values $\theta$, which then enable predicting $Y_t$ and $X_t$.

Next, we discuss the parameter estimation for model (2.2)-(2.3) using observations
$Y_t$ from some time interval $I = [t_0, t_1]$. The conditional distribution of each observation
$Y_t$ given the past $\mathcal{F}_{t-1}$ is determined by the structural variable $X_t$, whose dynamics is
described by the parameter vector $\theta$: $X_t = X_t(\theta)$ for $t \in I$ due to (2.3). We denote the
underlying value of $\theta$ by $\theta_0$.

For estimating $\theta_0$, we apply the quasi maximum likelihood (quasi-MLE) approach
using the estimating equations generated under the assumption of Gaussian errors $\varepsilon_t$.
This guarantees efficiency under the normality of innovations and consistency under rather
general moment conditions (Hansen and Lee, 1994; Francq and Zakoian, 2007). The
log-likelihood for the model (2.2)-(2.3) on an interval $I$ can be represented in the form

$$L_I(\theta) = \sum_{t \in I} \ell\{Y_t, g[X_t(\theta)]\}$$

with log-likelihood function $\ell(y, v) = -0.5\{\log(v) + y^2/v\}$. We define the quasi-MLE
estimate $\tilde\theta_I$ of the parameter $\theta$ by maximizing the log-likelihood $L_I(\theta)$,

$$\tilde\theta_I = \operatorname{argmax}_{\theta \in \Theta} L_I(\theta) = \operatorname{argmax}_{\theta \in \Theta} \sum_{t \in I} \ell\{Y_t, g[X_t(\theta)]\}, \qquad (2.4)$$

and denote by $L_I(\tilde\theta_I)$ the corresponding maximum.
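
The Gaussian quasi log-likelihood above is straightforward to evaluate once the variance path is known. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, the GARCH(1,1) recursion is started at a user-supplied value `x0`, and only the constant-volatility special case is maximized, because there the maximizer is available in closed form (the mean of the squared observations). A general GARCH fit would need a numerical optimizer, which is not shown.

```python
import math

def quasi_loglik(ys, variances):
    """Gaussian quasi log-likelihood: sum over t of -0.5 * (log(v_t) + y_t^2 / v_t)."""
    return sum(-0.5 * (math.log(v) + y * y / v) for y, v in zip(ys, variances))

def garch11_variances(ys, omega, alpha, beta, x0):
    """Variance path X_t = omega + alpha * Y_{t-1}^2 + beta * X_{t-1}, started at x0."""
    xs, x = [], x0
    for y in ys:
        xs.append(x)
        x = omega + alpha * y * y + beta * x
    return xs

def fit_constant_volatility(ys):
    """Closed-form quasi-MLE for a constant variance and the attained maximum."""
    v = sum(y * y for y in ys) / len(ys)
    return v, quasi_loglik(ys, [v] * len(ys))

ys = [1.0, -2.0, 0.5, 1.5]
v_hat, max_loglik = fit_constant_volatility(ys)
```

Evaluating `quasi_loglik(ys, garch11_variances(ys, omega, alpha, beta, x0))` over candidate parameter values gives the objective that a numerical quasi-MLE routine would maximize.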

To characterize the quality of estimating the parameter vector
$\theta_0 = (\omega, \alpha_1, \ldots, \alpha_p, \beta_1, \ldots, \beta_q)^\top$ by $\tilde\theta_I$, we now present an exact (nonasymptotic)
exponential risk bound. This bound concerns the value of maximum
$L_I(\tilde\theta_I) = \max_{\theta \in \Theta} L_I(\theta)$ rather than the point of maximum $\tilde\theta_I$. More precisely, we
consider the difference $L_I(\tilde\theta_I, \theta_0) = L_I(\tilde\theta_I) - L_I(\theta_0)$. By
definition, this value is non-negative and represents the deviation of the maximum of the
log-likelihood process from its value at the "true" point $\theta_0$. Later, we comment on how
the accuracy of estimation of the parameter $\theta_0$ by $\tilde\theta_I$ relates to the value $L_I(\tilde\theta_I, \theta_0)$.
We will also see that the bound for $L_I(\tilde\theta_I, \theta_0)$ yields the confidence set for the parameter
$\theta_0$, which will be used for the proposed change-point test. Now, the nonasymptotic
risk bound is specified in the following theorem, which formulates Corollaries 4.2 and 4.3
of Spokoiny (2009) for the case of quasi-MLE estimation of a CH model (2.2)-(2.3) at
$\theta = \theta_0$. The result can be viewed as an extension of the Wilks phenomenon that the
distribution of $L_I(\tilde\theta_I, \theta_0)$ for a linear Gaussian model is $\chi^2_p / 2$, where $p$ is the number
of estimated parameters in the model.

Theorem 2.1. Assume that the process $Y_t$ follows the model (2.2)-(2.3) with the
parameter $\theta_0 \in \Theta$, where the set $\Theta$ is compact. The function $g(\cdot)$ is assumed to be
continuously differentiable with the uniformly bounded first derivative and $g(x) \ge \delta > 0$
for all $x$. Further, let the process $X_t(\theta)$ be sub-ergodic in the sense that for any smooth
function $f(\cdot)$ there exists $f^*$ such that for any time interval $I$

$$E_{\theta_0} \Bigl| \sum_{t \in I} \bigl\{ f(X_t(\theta)) - E_{\theta_0} f(X_t(\theta)) \bigr\} \Bigr|^2 \le f^* |I|, \qquad \theta \in \Theta.$$

Let finally $E \exp\{\varkappa(\varepsilon_t^2 - 1)\} \le c(\varkappa)$ for some $\varkappa > 0$, $c(\varkappa) > 0$, and all $t \in \mathbb{N}$.
Then there are $\lambda > 0$ and $\mathfrak{e}(\lambda, \theta_0) > 0$ such that for any interval $I$ and $\mathfrak{z} > 0$

$$P_{\theta_0}\bigl(L_I(\tilde\theta_I, \theta_0) > \mathfrak{z}\bigr) \le \exp\{\mathfrak{e}(\lambda, \theta_0) - \lambda \mathfrak{z}\}. \qquad (2.5)$$

Moreover, for any $r > 0$, there is a constant $\mathfrak{R}_r(\theta_0)$ such that

$$E_{\theta_0} \bigl|L_I(\tilde\theta_I, \theta_0)\bigr|^r \le \mathfrak{R}_r(\theta_0). \qquad (2.6)$$

Remark 2.1. The condition $g(x) \ge \delta > 0$ guarantees that the variance process cannot
reach zero. In the case of GARCH, it is sufficient to assume $\omega > 0$, for instance.

One attractive feature of Theorem 2.1, formulated in the following corollary, is that
it enables constructing the non-asymptotic confidence sets and testing the parametric
hypothesis on the basis of the fitted log-likelihood $L_I(\tilde\theta_I, \theta)$. This feature is especially
important for our procedure presented in Section 3.

Corollary 2.2. Under the assumptions of Theorem 2.1, let the value $\mathfrak{z}_\alpha$ fulfill
$\mathfrak{e}(\lambda, \theta_0) - \lambda \mathfrak{z}_\alpha < \log \alpha$ for some $\alpha < 1$. Then the random set
$\mathcal{E}_I(\mathfrak{z}_\alpha) = \{\theta : L_I(\tilde\theta_I, \theta) \le \mathfrak{z}_\alpha\}$ is an $\alpha$-confidence set for $\theta_0$ in the sense that
$P_{\theta_0}(\theta_0 \notin \mathcal{E}_I(\mathfrak{z}_\alpha)) \le \alpha$.

Theorem 2.1 also gives a non-asymptotic and fixed upper bound for the risk of
estimation $L_I(\tilde\theta_I, \theta_0)$ that applies to an arbitrary sample size $|I|$. To understand the relation
of this result to the classical rate result, we can apply the standard arguments based on
the quadratic expansion of the log-likelihood $L(\tilde\theta, \theta)$. Let $\nabla^2 L(\theta)$ denote the Hessian
matrix of the second derivatives of $L(\theta)$ with respect to the parameter $\theta$. Then

$$L_I(\tilde\theta_I, \theta_0) = 0.5 (\tilde\theta_I - \theta_0)^\top \nabla^2 L_I(\theta'_I) (\tilde\theta_I - \theta_0), \qquad (2.7)$$

where $\theta'_I$ is a convex combination of $\theta_0$ and $\tilde\theta_I$. Under usual regularity assumptions
and for sufficiently large $|I|$, the normalized matrix $|I|^{-1} \nabla^2 L_I(\theta)$ is close to some matrix
$V(\theta)$, which depends only on the stationary distribution of $Y_t$ and is continuous in $\theta$.
Then (2.5) approximately means that $\|\sqrt{V(\theta_0)} (\tilde\theta_I - \theta_0)\|^2 \le \mathfrak{z}/|I|$ with probability
close to 1 for large $\mathfrak{z}$. Hence, the large deviation result of Theorem 2.1 yields the root-$|I|$
consistency of the MLE estimate $\tilde\theta_I$. See Spokoiny (2009) for further details.



3 Pointwise adaptive nonparametric estimation 

An obvious feature of the model (2.2)-(2.3) is that the parametric structure of the process
is assumed constant over the whole sample and thus cannot incorporate changes and
structural breaks at unknown times in the model. A natural generalization leads to models
whose coefficients may change over time (Fan and Zhang, 2008). One can then assume
that the structural process $X_t$ satisfies the relation (2.3) at any time, but the vector of
coefficients $\theta$ may vary with the time $t$, $\theta = \theta(t)$. The estimation of the coefficients
as general functions of time is possible only under some additional assumptions on these
functions. Typical assumptions are (i) varying coefficients are smooth functions of time
(Cai et al., 2000) and (ii) varying coefficients are piecewise constant functions (Bai and
Perron, 1998; Mikosch and Starica, 1999, 2004).

Our local parametric approach differs from the commonly used identification
assumptions (i) and (ii). We assume that the observed data $Y_t$ are described by a (partially)
unobserved process $X_t$ due to (2.2), and at each point $T$, there exists a historical
interval $I(T) = [t_0, T]$ in which the process $X_t$ "nearly" follows the parametric specification
(2.3) (see Section 4 for details on what "nearly" means). This local structural assumption
enables us to apply well developed parametric estimation for data $\{Y_t\}_{t \in I(T)}$ to estimate
the underlying parameter $\theta = \theta(T)$ by $\hat\theta = \hat\theta(T)$. (The estimate $\hat\theta = \hat\theta(T)$ can
then be used for estimating the value $\hat X_T$ of the process $X_t$ at $T$ from equation (2.3) and
for further modeling such as forecasting $Y_{T+1}$.) Moreover, this assumption includes the
above mentioned "smooth transition" and "switching regime" assumptions (i) and (ii) as
special cases: parameters $\theta(T)$ vary over time as the interval $I(T)$ changes with $T$, and
at the same time, discontinuities and jumps in $\theta(T)$ as a function of time are possible.

To estimate $\theta(T)$, we have to find the historical interval of homogeneity $I(T)$, that
is, the longest interval $I$ with the right-end point $T$, where data do not contradict a
specified parametric model with fixed parameter values. Starting at each time $T$ with a
very short interval $I = [t_0, T]$, we search by successively extending and testing the interval
$I$ for homogeneity against a change-point alternative: if the hypothesis of homogeneity
is not rejected for a given $I$, a larger interval is taken and tested again. Contrary to
Bai and Perron (1998) and Mikosch and Starica (1999), who detect all change points in a
given time series, our approach is local: it focuses on the local change-point analysis near
the point $T$ of estimation and tries to find only one change closest to the reference point.

In the rest of this section, we first discuss the test statistics employed to test the
time-homogeneity of an interval $I$ against a change-point alternative in Section 3.1. Later, we
rigorously describe the pointwise adaptive estimation procedure in Section 3.2. Its
implementation and the choice of parameters entering the adaptive procedure are described in
Sections 3.2-3.4. Theoretical properties of the method are studied in Section 4.

3.1 Test of homogeneity against a change-point alternative 

The pointwise adaptive estimation procedure crucially relies on the test of local
time-homogeneity of an interval $I = [t_0, T]$. The null hypothesis for $I$ means that the
observations $\{Y_t\}_{t \in I}$ follow the parametric model (2.2)-(2.3) with a fixed parameter $\theta_0$,
leading to the quasi-MLE estimate $\tilde\theta_I$ from (2.4) and the corresponding fitted log-likelihood
$L_I(\tilde\theta_I)$.

The change-point alternative for a given change-point location $\tau \in I$ can be described
as follows: process $Y_t$ follows the parametric model (2.2)-(2.3) with a parameter $\theta_J$ for
$t \in J = [t_0, \tau]$ and with a different parameter $\theta_{J^c}$ for $t \in J^c = [\tau + 1, T]$; $\theta_J \ne \theta_{J^c}$.
The fitted log-likelihood under this alternative reads as $L_J(\tilde\theta_J) + L_{J^c}(\tilde\theta_{J^c})$. The test of
homogeneity can be performed using the likelihood ratio (LR) test statistic $T_{I,\tau}$:

$$T_{I,\tau} = \max_{\theta_J, \theta_{J^c} \in \Theta} \{L_J(\theta_J) + L_{J^c}(\theta_{J^c})\} - \max_{\theta \in \Theta} L_I(\theta) = L_J(\tilde\theta_J) + L_{J^c}(\tilde\theta_{J^c}) - L_I(\tilde\theta_I).$$

Since the change-point location $\tau$ is generally not known, we consider the supremum of
the LR statistics $T_{I,\tau}$ over some subset $\tau \in \mathcal{T}(I)$, cf. Andrews (1993):

$$T_{I,\mathcal{T}(I)} = \sup_{\tau \in \mathcal{T}(I)} T_{I,\tau}. \qquad (3.1)$$

A typical example of a set $\mathcal{T}(I)$ is $\mathcal{T}(I) = \{\tau : t_0 + m' \le \tau \le T - m''\}$ for some fixed
$m', m'' > 0$.
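
As an illustration of the supremum LR statistic (3.1), the following minimal Python sketch treats the simplest special case, a locally constant variance, where the fitted log-likelihood has a closed form. The function names and the 0-based indexing of split points are assumptions made for this example only; for ARCH or GARCH the closed-form fit would have to be replaced by a numerical quasi-MLE on each sub-interval.

```python
import math

def fitted_loglik(ys):
    """Maximized Gaussian quasi log-likelihood of a constant-variance model on ys.

    Requires at least one non-zero observation, otherwise log(0) is undefined.
    """
    v = sum(y * y for y in ys) / len(ys)
    return -0.5 * len(ys) * (math.log(v) + 1.0)

def sup_lr(ys, taus):
    """Supremum of the LR statistic over candidate split points.

    ys covers the tested interval; tau is the 0-based index of the last
    observation before the break, so the two pieces are ys[:tau + 1] and ys[tau + 1:].
    """
    full = fitted_loglik(ys)
    best_stat, best_tau = -math.inf, None
    for tau in taus:
        stat = fitted_loglik(ys[:tau + 1]) + fitted_loglik(ys[tau + 1:]) - full
        if stat > best_stat:
            best_stat, best_tau = stat, tau
    return best_stat, best_tau

# Toy series whose variance jumps after the fifth observation.
ys = [0.1, -0.1, 0.1, -0.1, 0.1, 2.0, -2.0, 2.0, -2.0, 2.0]
stat, tau = sup_lr(ys, range(2, 8))
```

Restricting `taus` away from both ends of the interval plays the role of the margins m' and m'' in the typical example of the set of tested change points.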

3.2 Adaptive search for the longest interval of homogeneity 

This section presents the proposed adaptive pointwise estimation procedure. At each point
$T$, we aim at estimating the unknown parameters $\theta(T)$ from historical data $Y_t$, $t \le T$;
this procedure repeats for every current time point $T$ as new data arrive. At the first
step, the procedure selects on the basis of historical data an interval $\hat I(T)$ of homogeneity
in which the data do not contradict the parametric model (2.2)-(2.3). Afterwards, the
quasi-MLE estimation is applied using the selected historical interval $\hat I(T)$ to obtain the
estimate $\hat\theta(T) = \tilde\theta_{\hat I(T)}$. From now on, we consider an arbitrary, but fixed time point $T$.

Suppose that a growing set $I_0 \subset I_1 \subset \ldots \subset I_K$ of historical interval-candidates
$I_k = [T - m_k + 1, T]$ with the right-end point $T$ is fixed. The smallest interval $I_0$ is
accepted automatically as homogeneous. Then the procedure successively checks every
larger interval $I_k$ for homogeneity using the test statistic $T_{I_k, \mathcal{T}(I_k)}$ from (3.1). The selected
interval $\hat I$ corresponds to the largest accepted interval $I_{\hat k}$ with index $\hat k$ such that

$$T_{I_k, \mathcal{T}(I_k)} \le \mathfrak{z}_k, \qquad k \le \hat k, \qquad (3.2)$$

and $T_{I_{\hat k + 1}, \mathcal{T}(I_{\hat k + 1})} > \mathfrak{z}_{\hat k + 1}$, where the critical values $\mathfrak{z}_k$ are discussed later in this section
and specified in Section 3.3. This procedure then leads to the adaptive estimate $\hat\theta = \tilde\theta_{\hat I}$
corresponding to the selected interval $\hat I = I_{\hat k}$.

The complete description of the procedure includes two steps. (A) Fixing the set-up
and the parameters of the procedure. (B) Data-driven search for the longest interval of
homogeneity.

(A) Set-up and parameters:

1. Select a specific parametric model (2.2)-(2.3) (e.g., constant volatility, ARCH(1),
GARCH(1,1)).

2. Select the set $\mathcal{I} = \{I_0, \ldots, I_K\}$ of interval-candidates, and for each $I_k \in \mathcal{I}$, the
set $\mathcal{T}(I_k)$ of possible change points $\tau \in I_k$ used in the LR test (3.1).

3. Select the critical values $\mathfrak{z}_1, \ldots, \mathfrak{z}_K$ in (3.2) as described in Section 3.3.

(B) Adaptive search and estimation: Set $k = 1$, $\hat I = I_0$, and $\hat\theta = \tilde\theta_{I_0}$.

1. Test the hypothesis $H_{0,k}$ of no change point within the interval $I_k$ using test
statistics (3.1) and the critical values $\mathfrak{z}_k$ obtained in (A3). If a change point
is detected ($H_{0,k}$ is rejected), go to (B3). Otherwise proceed with (B2).

2. Set $\hat I = I_k$ and $\hat\theta_{I_k} = \tilde\theta_{I_k}$. Further, set $k := k + 1$. If $k \le K$, repeat (B1);
otherwise go to (B3).

3. Define $\hat I = I_{k-1}$ = "the last accepted interval" and $\hat\theta = \tilde\theta_{\hat I}$. Additionally, set
$\hat\theta_{I_k} = \ldots = \hat\theta_{I_K} = \hat\theta$ if $k \le K$.
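
The search in step (B) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `test_stat` is a hypothetical callable standing in for the supremum LR statistic of the k-th interval, `crit` holds precomputed critical values, and the function returns the index of the last accepted interval together with the observations it covers, on which the quasi-MLE would then be computed.

```python
def adaptive_interval(ys, lengths, crit, test_stat):
    """Return the index of the longest accepted interval and its observations.

    ys        -- observations up to the current time point (ys[-1] is the latest)
    lengths   -- increasing interval lengths m_0 < m_1 < ... < m_K
    crit      -- crit[k] is the critical value for the k-th interval (crit[0] unused)
    test_stat -- hypothetical callable (ys, k) -> homogeneity statistic for interval k
    """
    k_hat = 0  # the smallest interval is accepted without testing
    for k in range(1, len(lengths)):
        if lengths[k] > len(ys):        # data exhausted
            break
        if test_stat(ys, k) > crit[k]:  # change point detected: stop (step B3)
            break
        k_hat = k                       # interval k accepted as homogeneous (step B2)
    return k_hat, ys[-lengths[k_hat]:]

# Toy run with a stand-in statistic that signals a break from the fourth interval on.
lengths = [2, 4, 8, 16, 32]
crit = [None, 1.0, 1.0, 1.0, 1.0]
k_hat, window = adaptive_interval(list(range(40)), lengths, crit,
                                  lambda ys, k: 0.0 if k < 3 else 10.0)
```

Because the critical values do not depend on the current time point, the same `lengths` and `crit` can be reused as the function is called again for each new observation.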

In step (A), one has to select three main ingredients of the procedure. First,
the parametric model used locally to approximate the process $Y_t$ has to be specified in
(A1), for example, the constant volatility or GARCH(1,1) in our context. Next, in step
(A2), the set of intervals $\mathcal{I} = \{I_k\}_{k=0}^{K}$ is fixed, each interval with the right-end point
$T$, length $m_k = |I_k|$, and the set $\mathcal{T}(I_k)$ of tested change points. Our default proposal
is to use a geometric grid $m_k = [m_0 a^k]$, $a > 1$, and to set $I_k = [T - m_k + 1, T]$ and
$\mathcal{T}(I_k) = [T - m_{k-1} + 1, T - m_{k-2}]$. Although our experiments show that the procedure
is rather insensitive to the choice of $m_0$ and $a$ (e.g., we use $m_0 = 10$ and $a = 1.25$ in
simulations), the length $m_0$ of interval $I_0$ should take into account the parametric model
selected in (A1). The reason is that $I_0$ is always assumed to be time-homogeneous and
$m_0$ thus has to reflect flexibility of the parametric model; for example, while $m_0 = 20$
might be reasonable for the GARCH(1,1) model, $m_0 = 5$ could be a reasonable choice for
the locally constant approximation of a volatility process. Finally, in step (A3), one
has to select the $K$ critical values $\mathfrak{z}_k$ in (3.2) for the LR test statistics $T_{I_k, \mathcal{T}(I_k)}$ from
(3.1). The critical values $\mathfrak{z}_k$ will generally depend on the parametric model describing
the null hypothesis of time-homogeneity, the set $\mathcal{I}$ of intervals $I_k$ and corresponding
sets of considered change points $\mathcal{T}(I_k)$, $k \le K$, and additionally, on two constants $r$
and $\rho$ that are counterparts of the usual significance level. All these determinants of
the critical values can be selected in step (A) and the critical values are thus obtained
before the actual estimation takes place in step (B). Due to its importance, the method
of constructing critical values $\{\mathfrak{z}_k\}_{k=1}^{K}$ is discussed separately in Section 3.3.
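
The default geometric grid can be generated as follows. This is a small sketch under two assumptions that are not stated explicitly in the text: the square brackets in the grid formula are read as the integer part, and the set of tested change points is only formed for k >= 2, since it refers to the lengths of the two preceding intervals. Function names are illustrative.

```python
import math

def interval_grid(m0, a, K):
    """Interval lengths m_k = floor(m0 * a**k) for k = 0, ..., K."""
    return [int(math.floor(m0 * a ** k)) for k in range(K + 1)]

def change_point_offsets(lengths, k):
    """Offsets back from the right-end point scanned when testing interval k (k >= 2).

    An offset d stands for the time point T - d, so the returned range
    corresponds to T - m_{k-1} + 1 <= tau <= T - m_{k-2}.
    """
    return list(range(lengths[k - 2], lengths[k - 1]))

lengths = interval_grid(10, 1.25, 5)
offsets = change_point_offsets(lengths, 2)
```

With the simulation defaults quoted above (initial length 10, multiplier 1.25) the first few lengths are 10, 12, 15, 19, 24, 30, so each new interval scans only the stretch of data added by the previous extension.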

The main step (B) performs the search for the longest time-homogeneous interval.
Initially, $I_0$ is assumed to be homogeneous. If $I_{k-1}$ is negatively tested on the presence
of a change point, one continues with $I_k$ by employing the test (3.1) in step (B1), which
checks for a potential change point in $I_k$. If no change point is found, then $I_k$ is accepted
as time-homogeneous in step (B2); otherwise the procedure terminates in step (B3). We
sequentially repeat these tests until we find a change point or exhaust all intervals. The
latest (longest) interval accepted as time-homogeneous is used for estimation in step (B3).
Note that the estimate $\hat\theta_{I_k}$ defined in (B2) and (B3) corresponds to the latest accepted
interval $\hat I_k$ after the first $k$ steps, or equivalently, the interval selected out of $I_0, \ldots, I_k$.

Moreover, the whole search and estimation step (B) can be repeated at different time
points $T$ without reiterating the initial step (A) as the critical values $\mathfrak{z}_k$ depend only
on the approximating parametric model and interval lengths $m_k = |I_k|$, not on the time
point $T$ (see Section 3.3).

3.3 Choice of critical values $\mathfrak{z}_k$

The presented method of choosing the interval of homogeneity $\hat I$ can be viewed as a multiple
testing procedure. The critical values for this procedure are selected using the general
approach of testing theory: to provide a prescribed performance of the procedure under the
null hypothesis, that is, in the pure parametric situation. This means that the procedure
is trained on the data generated from the pure parametric time homogeneous model from
step (A1). The correct choice in this situation is the largest considered interval $I_K$ and
a choice $I_{\hat k}$ with $\hat k < K$ can be interpreted as a "false alarm". We select the minimal
critical values ensuring a small probability of such a false alarm. Our condition slightly
differs, though, from the classical level condition because we focus on parameter estimation
rather than on hypothesis testing.

In the pure parametric case, the "ideal" estimate corresponds to the largest considered
interval $I_K$. Due to Theorem 2.1, the quality of estimation of the parameter $\theta_0$ by
$\tilde\theta_{I_K}$ can be measured by the log-likelihood "loss" $L_{I_K}(\tilde\theta_{I_K}, \theta_0)$, which is stochastically
bounded with exponential and polynomial moments: $E_{\theta_0} |L_{I_K}(\tilde\theta_{I_K}, \theta_0)|^r \le \mathfrak{R}_r(\theta_0)$. If
the adaptive procedure stops earlier at some intermediate step $k < K$, we select instead
of $\tilde\theta_{I_K}$ another estimate $\hat\theta = \tilde\theta_{I_k}$ with a larger variability. The loss associated with
such a false alarm can be measured by the value $L_{I_K}(\tilde\theta_{I_K}, \hat\theta) = L_{I_K}(\tilde\theta_{I_K}) - L_{I_K}(\hat\theta)$. The
corresponding condition bounding the loss due to the adaptive estimation reads as

$$E_{\theta_0} \bigl|L_{I_K}(\tilde\theta_{I_K}, \hat\theta)\bigr|^r \le \rho\, \mathfrak{R}_r(\theta_0). \qquad (3.3)$$

This is in fact an implicit condition on the critical values $\{\mathfrak{z}_k\}_{k=1}^{K}$, which ensures that
the loss associated with the false alarm is at most the $\rho$-fraction of the log-likelihood
loss of the "ideal" or "oracle" estimate $\tilde\theta_{I_K}$ for the parametric situation. The constant
$r$ corresponds to the power of the loss in (3.3), while $\rho$ is similar in meaning to the test
level. In the limit case when $r$ tends to zero, this condition (3.3) becomes the usual level
condition: $P_{\theta_0}(I_K \text{ is rejected}) = P_{\theta_0}(\tilde\theta_{I_K} \ne \hat\theta) \le \rho$. The choice of the metaparameters
$r$ and $\rho$ is discussed in Section 3.4.

A condition similar to (3.3) is imposed at each step of the adaptive procedure. The
estimate $\hat\theta_{I_k}$ coming after the $k$ steps of the procedure should satisfy

$$E_{\theta_0} \bigl|L_{I_k}(\tilde\theta_{I_k}, \hat\theta_{I_k})\bigr|^r \le \rho_k\, \mathfrak{R}_r(\theta_0), \qquad k = 1, \ldots, K, \qquad (3.4)$$

where $\rho_k = \rho k / K \le \rho$. The following theorem presents some sufficient conditions on the
critical values $\{\mathfrak{z}_k\}_{k=1}^{K}$ ensuring (3.4); recall that $m_k = |I_k|$ denotes the length of $I_k$.

Theorem 3.1. Suppose that $r > 0$, $\rho > 0$. Under the assumptions of Theorem 2.1,
there are constants $a_0, a_1, a_2$ such that the condition (3.4) is fulfilled with the choice

$$\mathfrak{z}_k = a_0 r \log(\rho^{-1}) + a_1 r \log(m_K / m_{k-1}) + a_2 \log(m_k), \qquad k = 1, \ldots, K.$$

Since $K$ and $\{m_k\}_{k=1}^{K}$ are fixed, the $\mathfrak{z}_k$'s in Theorem 3.1 have the form
$\mathfrak{z}_k = C + D \log(m_k)$ for $k = 1, \ldots, K$ with some constants $C$ and $D$. However, a practically
relevant choice of these constants has to be done by Monte Carlo simulations. Note first
that every particular choice of the coefficients $C$ and $D$ determines the whole set of the
critical values $\{\mathfrak{z}_k\}_{k=1}^{K}$ and thus the local change-point procedure. For the critical values
given by fixed $(C, D)$, one can run the procedure and observe its performance on the
simulated data using the data-generating process (2.2)-(2.3); in particular, one can check
whether the condition (3.4) is fulfilled. For any (sufficiently large) fixed value of $C$, one
can thus find the minimal value $D(C) < 0$ of $D$ that ensures (3.4). Every corresponding
set of critical values in the form $\mathfrak{z}_k = C + D(C) \log(m_k)$ is admissible. The condition
$D(C) < 0$ ensures that the critical values decrease with $k$. This reflects the fact that
a false alarm at an early stage of the algorithm is more crucial because it leads to the
choice of a highly variable estimate. The critical values $\mathfrak{z}_k$ for small $k$ should thus be
rather conservative to provide the stability of the algorithm in the parametric situation.
To determine $C$, the value $\mathfrak{z}_1$ can be fixed by considering the false alarm at the first step
of the procedure, which leads to estimation using the smallest interval $I_0$ instead of the
"ideal" largest interval $I_K$. The related condition (used in Section 5.1) reads as

$$E_{\theta_0} \bigl|L_{I_K}(\tilde\theta_{I_K}, \tilde\theta_{I_0})\bigr|^r \, \mathbf{1}\bigl(T_{I_1, \mathcal{T}(I_1)} > \mathfrak{z}_1\bigr) \le \rho\, \mathfrak{R}_r(\theta_0) / K. \qquad (3.5)$$

Alternatively, one could select a pair $(C, D)$ that minimizes the resulting prediction error,
see Section 3.4.
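
The Monte Carlo calibration described above can be prototyped as follows. The sketch is deliberately simplified and rests on several assumptions: it uses the constant-volatility model with standard Gaussian data as the homogeneous benchmark, it tests intervals only from k = 2 on (where the default set of tested change points is defined), and it estimates the plain frequency of stopping early, which corresponds to the level-type condition obtained in the limit of small r rather than to the moment condition itself. All function names are illustrative.

```python
import math
import random

def critical_values(C, D, lengths):
    """Critical values C + D * log(m_k) for k = 1..K; index 0 is a placeholder."""
    return [None] + [C + D * math.log(m) for m in lengths[1:]]

def loglik(ys):
    """Maximized Gaussian quasi log-likelihood of a constant-variance model."""
    v = sum(y * y for y in ys) / len(ys)
    return -0.5 * len(ys) * (math.log(v) + 1.0)

def stops_early(ys, lengths, crit):
    """True if some interval is rejected, i.e. a false alarm under homogeneity."""
    for k in range(2, len(lengths)):
        window = ys[-lengths[k]:]
        full = loglik(window)
        # Offsets d correspond to break points T - d in the default tested set.
        for d in range(lengths[k - 2], lengths[k - 1]):
            split = len(window) - d
            stat = loglik(window[:split]) + loglik(window[split:]) - full
            if stat > crit[k]:
                return True
    return False

def false_alarm_rate(C, D, lengths, n_rep=200, seed=0):
    """Share of homogeneous Gaussian samples on which the search stops early."""
    rng = random.Random(seed)
    crit = critical_values(C, D, lengths)
    hits = 0
    for _ in range(n_rep):
        ys = [rng.gauss(0.0, 1.0) for _ in range(lengths[-1])]
        hits += stops_early(ys, lengths, crit)
    return hits / n_rep

lengths = [10, 12, 15, 19, 24, 30]
rate = false_alarm_rate(C=5.0, D=-0.5, lengths=lengths)
```

Scanning a grid of `D` values for a fixed `C` and keeping the smallest one whose false-alarm rate stays below the target mimics the search for the minimal admissible slope; replacing the frequency by a simulated moment of the log-likelihood loss would bring the check closer to the condition used in the text.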

3.4 Selecting parameters r and p 

The choice of critical values using inequality (3.4) additionally depends on two "metaparameters"
$r$ and $\rho$. A simple strategy is to use conservative values for these parameters
and the corresponding set of critical values (e.g., our default is $r = 1$ and $\rho = 1$). On
the other hand, the two parameters are global in the sense that they are independent of
$T$. Hence, one can also determine them in a data-driven way by minimizing some global
forecasting error (Cheng et al., 2003). Different values of $r$ and $\rho$ may lead to different
sets of critical values and hence to different estimates $\hat\theta^{(r,\rho)}(T)$ and to different forecasts
$\hat Y^{(r,\rho)}_{T+h|T}$ of the future values $Y_{T+h}$, where $h$ is the forecasting horizon. Now, a data-driven
choice of $r$ and $\rho$ can be made by minimizing the following objective function:

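$(\hat r, \hat\rho) = \operatorname{argmin}_{r>0,\,\rho>0} \mathrm{PE}_{\Lambda,\mathcal{H}}(r, \rho) = \operatorname{argmin}_{r,\rho} \sum_{T} \sum_{h\in\mathcal{H}} \Lambda\bigl(Y_{T+h}, \hat Y^{(r,\rho)}_{T+h|T}\bigr),$ (3.6)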

where $\Lambda$ is a loss function and $\mathcal{H}$ is the forecasting horizon set. For example, one can
take $\Lambda_r(v, v') = |v - v'|^r$ for $r \in [1/2, 2]$. For daily data, the forecasting horizon could
be one day, $\mathcal{H} = \{1\}$, or two weeks, $\mathcal{H} = \{1, \ldots, 10\}$.
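The data-driven choice in (3.6) amounts to a grid search over $(r, \rho)$. The sketch below assumes a hypothetical factory `make_forecaster(r, rho)` that returns a function giving the forecast of $Y_{T+h}$ made at time $T$; how such forecasts are produced by the adaptive procedure is not shown, and all names are illustrative.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Forecaster = Callable[[int, int], float]  # (T, h) -> forecast of y[T + h] made at T


def power_loss(v: float, v_hat: float, power: float = 1.0) -> float:
    """Loss |v - v_hat| ** power; the exponent is the index of the loss, not the tuning r."""
    return abs(v - v_hat) ** power


def prediction_error(
    y: Sequence[float],
    forecaster: Forecaster,
    times: Sequence[int],
    horizons: Sequence[int],
    loss: Callable[[float, float], float] = power_loss,
) -> float:
    """Sum of losses over all forecast origins T and horizons h."""
    return sum(loss(y[T + h], forecaster(T, h)) for T in times for h in horizons)


def select_r_rho(
    y: Sequence[float],
    make_forecaster: Callable[[float, float], Forecaster],
    r_grid: Sequence[float],
    rho_grid: Sequence[float],
    times: Sequence[int],
    horizons: Sequence[int],
) -> Tuple[Tuple[float, float], Dict[Tuple[float, float], float]]:
    """Return the (r, rho) pair with the smallest prediction error and all scores."""
    scores = {
        (r, rho): prediction_error(y, make_forecaster(r, rho), times, horizons)
        for r, rho in product(r_grid, rho_grid)
    }
    best = min(scores, key=scores.get)
    return best, scores


# Toy illustration: a constant series and forecasters that always predict r * rho.
y_toy = [1.0] * 20
best_pair, all_scores = select_r_rho(
    y_toy,
    make_forecaster=lambda r, rho: (lambda T, h: r * rho),
    r_grid=[0.5, 1.0],
    rho_grid=[0.5, 1.0, 1.5],
    times=range(5, 10),
    horizons=[1],
)
```

In the toy run only the pair whose product equals the constant series value has zero error, so it is selected.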

4 Theoretic properties 

In this section, we collect basic results describing the quality of the proposed adaptive
procedure. First, the definition of the procedure ensures the performance prescribed
by (3.4) in the parametric situation. We however claimed that the adaptive pointwise
estimation applies even if the process $Y_t$ is only locally approximated by a parametric
model. Therefore, we now define a locally "nearly parametric" process, for which we derive
an analogue of Theorem 2.1 (Section 4.1). Later, we prove certain "oracle" properties of
the proposed method (Section 4.2).

4.1 Small modeling bias condition 

This section discusses the concept of the "nearly parametric" case. To define it rigorously, we
have to quantify the quality of approximating the true latent process $X_t$, which drives
the observed data $Y_t$ due to (2.2), by the parametric process $X_t(\theta)$ described by (2.3)
for some $\theta \in \Theta$. Below we assume that the innovations $\varepsilon_t$ in the model (2.2) are
independent and identically distributed and denote the distribution of $\sqrt{v}\,\varepsilon_t$ by $P_v$ so
that the conditional distribution of $Y_t$ given $\mathcal{F}_{t-1}$ is $P_{g(X_t)}$. To measure the distance of a
data-generating process from a parametric model, we introduce for every interval $I_k \in \mathcal{I}$
and every parameter $\theta \in \Theta$ the random quantity

$\Delta_{I_k}(\theta) = \sum_{t\in I_k} \mathcal{K}\{g(X_t), g[X_t(\theta)]\},$
where $\mathcal{K}(v, v')$ denotes the Kullback-Leibler distance between $P_v$ and $P_{v'}$. For CH
models with Gaussian innovations $\varepsilon_t$, $\mathcal{K}(v, v') = -0.5\{\log(v/v') + 1 - v/v'\}$. In the
parametric case with $X_t = X_t(\theta_0)$, we clearly have $\Delta_{I_k}(\theta_0) = 0$. To characterize the
"nearly parametric case," we introduce the small modeling bias (SMB) condition, which simply
means that, for some $\theta \in \Theta$, $\Delta_{I_k}(\theta)$ is bounded by a small constant with a high
probability. Informally, this means that the "true" model can be well approximated on
the interval $I_k$ by the parametric one with the parameter $\theta$. The best parametric fit (2.3)
to the underlying model (2.2) on $I_k$ can be defined by minimizing the value $E\Delta_{I_k}(\theta)$
over $\theta \in \Theta$, and $\tilde\theta_{I_k}$ can be viewed as its estimate.
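For Gaussian innovations the modeling bias can be evaluated directly from the two variance paths. The following minimal sketch assumes that the inputs are the conditional variances $g(X_t)$ and $g[X_t(\theta)]$ on the interval; the function names and the toy numbers are illustrative only.

```python
import math
from typing import Sequence


def kl_gaussian_variance(v: float, v_prime: float) -> float:
    """Kullback-Leibler distance between N(0, v) and N(0, v_prime)."""
    return -0.5 * (math.log(v / v_prime) + 1.0 - v / v_prime)


def modeling_bias(true_var: Sequence[float], fitted_var: Sequence[float]) -> float:
    """Sum of the pointwise Kullback-Leibler distances over the interval."""
    return sum(kl_gaussian_variance(v, vp) for v, vp in zip(true_var, fitted_var))


# Toy interval: the true variance doubles halfway, the fitted model keeps it at 1.
bias = modeling_bias([1.0, 1.0, 2.0, 2.0], [1.0, 1.0, 1.0, 1.0])
```

The bias is zero whenever the two paths coincide and grows with the number of points at which they differ, which is why a long interval containing a change violates the SMB condition.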

The following theorem claims that the results on the accuracy of estimation given
in Theorem 2.1 can be extended from the parametric case to the general nonparametric
situation under the SMB condition. Let $\varrho(\hat\theta, \theta)$ be any loss function for an estimate $\hat\theta$.

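Theorem 4.1. Let for some $\theta \in \Theta$ and some $\Delta \ge 0$

$E\Delta_{I_k}(\theta) \le \Delta.$ (4.1)

Then it holds for an estimate $\hat\theta$ constructed from the observations $\{Y_t\}_{t\in I_k}$ that

$E\log\bigl(1 + \varrho(\hat\theta, \theta)/E_\theta\varrho(\hat\theta, \theta)\bigr) \le 1 + \Delta.$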

This general result applied to the quasi-MLE estimation with the loss function $L_I(\tilde\theta_I, \theta)$
yields the following corollary.

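Corollary 4.2. Let the SMB condition (4.1) hold for some interval $I_k$ and $\theta \in \Theta$. Then

$E\log\bigl(1 + |L_{I_k}(\tilde\theta_{I_k}, \theta)|^r/\mathcal{R}_r(\theta)\bigr) \le 1 + \Delta,$

where $\mathcal{R}_r(\theta)$ is the parametric risk bound from (2.6).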

This result shows that the estimation loss $|L_I(\tilde\theta_I, \theta)|^r$ normalized by the parametric
risk $\mathcal{R}_r(\theta)$ is stochastically bounded by a constant proportional to $e^{\Delta}$. If $\Delta$ is not
large, this result extends the parametric risk bound (Theorem 2.1) to the nonparametric
situation under the SMB condition. Another implication of Corollary 4.2 is that the
confidence set built for the parametric model (Corollary 2.2) continues to hold, with a
slightly smaller coverage probability, under SMB.
4.2 The "oracle" choice and the "oracle" result

Corollary 4.2 suggests that the "optimal" or "oracle" choice of the interval $I_k$ from the
set $I_1, \ldots, I_K$ can be defined as the largest interval for which the SMB condition (4.1)
still holds (for a given small $\Delta > 0$). For such an interval, one can neglect deviations of
the underlying process from a parametric model with a fixed parameter $\theta$. Therefore,
we say that the choice $k^*$ is the "oracle" choice if there exists $\theta \in \Theta$ such that

$E\Delta_{I_{k^*}}(\theta) \le \Delta$ (4.2)

for a fixed $\Delta > 0$ and (4.2) does not hold for $k > k^*$. Unfortunately, the underlying
process $X_t$, and hence the value $\Delta_{I_k}$, is unknown and the oracle choice cannot be
implemented. The proposed adaptive procedure tries to mimic this oracle on the basis
of available data using the sequential test of homogeneity. The final oracle result claims
that the adaptive estimate provides the same (in order) accuracy as the oracle one.

By construction, the pointwise adaptive procedure described in Section 3 provides
the prescribed performance if the underlying process follows the parametric model (2.2).
Now, condition (3.4) combined with Theorem 4.1 implies similar performance in the first
$k^*$ steps of the adaptive estimation procedure.

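Theorem 4.3. Let $\theta \in \Theta$ and $\Delta > 0$ be such that $E\Delta_{I_{k^*}}(\theta) \le \Delta$ for some $k^* \le K$.
Also let $\max_{k \le k^*} E|L_{I_k}(\tilde\theta_{I_k}, \theta)|^r \le \mathcal{R}_r(\theta)$. Then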

Similarly to the parametric case, under the SMB condition $E\Delta_{I_{k^*}}(\theta) \le \Delta$, any choice
$\hat k < k^*$ can be viewed as a false alarm. Theorem 4.3 documents that the loss induced by
such a false alarm at the first $k^*$ steps and measured by $L_{I_{k^*}}(\tilde\theta_{I_{k^*}}, \hat\theta_{I_{k^*}})$ is of the same
magnitude as the loss $L_{I_{k^*}}(\tilde\theta_{I_{k^*}}, \theta)$ of estimating the parameter $\theta$ from the SMB (4.2)
by $\tilde\theta_{I_{k^*}}$. Thus under (4.2), the adaptive estimation during steps $k \le k^*$ does not induce
larger errors into estimation than the quasi-MLE estimation itself.

For further steps of the algorithm with $k > k^*$, where (4.2) does not hold, the value
$\Delta' = E\Delta_{I_k}(\theta)$ can be large and the bound for the risk becomes meaningless due to the
factor $e^{\Delta'}$. To establish the result about the quality of the final estimate, we thus have
to show that the quality of estimation cannot be destroyed at the steps $k > k^*$. The next
"oracle" result states the final quality of our adaptive estimate $\hat\theta$.
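Theorem 4.4. Let $E\Delta_{I_{k^*}}(\theta) \le \Delta$ for some $k^* \le K$. Then $L_{I_{k^*}}(\tilde\theta_{I_{k^*}}, \hat\theta)\,\mathbf{1}(\hat k \ge k^*) \le \mathfrak{z}_{k^*}$,
yielding

$E\log\Bigl(1 + \frac{|L_{I_{k^*}}(\tilde\theta_{I_{k^*}}, \hat\theta)|^r}{\mathcal{R}_r(\theta)}\Bigr) \le \rho + \Delta + \log\Bigl(1 + \frac{\mathfrak{z}_{k^*}^r}{\mathcal{R}_r(\theta)}\Bigr).$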

Due to this result, the value $L_{I_{k^*}}(\tilde\theta_{I_{k^*}}, \hat\theta)$ is stochastically bounded. This can be
interpreted as the oracle property of $\hat\theta$ because it means that the adaptive estimate $\hat\theta$ belongs
with a high probability to the confidence set of the oracle estimate $\tilde\theta_{I_{k^*}}$.

5 Simulation study 

In the last two sections, we present a simulation study (Section 5) and real-data applications
(Section 6) documenting the performance of the proposed adaptive estimation procedure.
To verify the practical applicability of the method in a complex setting, we concentrate on
the volatility estimation using parametric and adaptive pointwise estimation of constant
volatility, ARCH(1), and GARCH(1,1) models (for the sake of brevity, referred to as
the local constant, local ARCH, and local GARCH). The reason is that the estimation
of GARCH models generally requires hundreds of observations for a reasonable quality of
estimation, which puts the adaptive procedure working with samples as small as 10 or
20 observations to a hard test. Additionally, the critical values obtained as described in
Section 3.3 depend on the underlying parameter values in the case of (G)ARCH.

Here we first study the finite-sample critical values for the test of homogeneity by
means of Monte Carlo simulations and discuss practical implementation details (Section 5.1).
Later, we demonstrate the performance of the proposed adaptive pointwise
estimation procedure in simulated samples (Section 5.2). Note that, throughout this section,
we identify the GARCH(1,1) models by triplets $(\omega, \alpha, \beta)$: for example, the $(1, 0.1, 0.3)$-model.
Constant volatility and ARCH(1) are then indicated by $\alpha = \beta = 0$ and $\beta = 0$,
respectively. The GARCH estimation is done using the GARCH 3.0 package (Laurent and
Peters, 2006) and Ox 3.30 (Doornik, 2002). Finally, since the focus is on modelling the
volatility of $Y_t$ in (2.2), the performance measurement and comparison of all models at time
$t$ is done by the absolute prediction error (PE) of the volatility process over a prediction
horizon $\mathcal{H}$: $\mathrm{APE}(t) = \sum_{h\in\mathcal{H}} |\sigma^2_{t+h} - \hat\sigma^2_{t+h|t}| / |\mathcal{H}|$, where $\hat\sigma^2_{t+h|t}$ represents the volatility
prediction by a particular model.
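As a small illustration, and assuming that the absolute PE is the mean absolute difference between the true and predicted volatility over the horizon set, the criterion can be computed as below. The container layout (a mapping from horizon to prediction) is an assumption made for this sketch.

```python
from typing import Mapping, Sequence


def absolute_prediction_error(
    sigma2: Sequence[float],
    sigma2_hat: Mapping[int, float],
    t: int,
    horizons: Sequence[int],
) -> float:
    """Mean of |sigma2[t + h] - sigma2_hat[h]| over the horizons h.

    sigma2_hat[h] is the prediction of sigma2[t + h] made at time t.
    """
    errors = [abs(sigma2[t + h] - sigma2_hat[h]) for h in horizons]
    return sum(errors) / len(horizons)


# Toy example: predictions made at t = 0 for horizons 1 and 2.
ape = absolute_prediction_error(
    sigma2=[1.0, 1.2, 0.8, 1.5],
    sigma2_hat={1: 1.0, 2: 1.0},
    t=0,
    horizons=[1, 2],
)
```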
5.1 Finite-sample critical values for test of homogeneity

A practical application of the pointwise adaptive procedure requires critical values for
the test of local homogeneity of a time series. Since they are obtained under the null
hypothesis that a chosen parametric model (locally) describes the data, see Section 3, we
need to obtain the critical values for the constant volatility, ARCH(1), and GARCH(1,1)
models. Furthermore, for given $r$ and $\rho$, the average risk (3.4) between the adaptive and
oracle estimates can be bounded for critical values that linearly depend on the logarithm
of the interval length $|I_k|$: $\mathfrak{z}(|I_k|) = \mathfrak{z}_k = C + D\log(|I_k|)$ (see Theorem 3.1). As described
in Section 3.3, we choose here the smallest $C$ satisfying (3.5) and the corresponding
minimum admissible value $D = D(C) < 0$ that guarantees condition (3.4).

We simulated the critical values for ARCH(1) and GARCH(1,1) models with different
values of underlying parameters; see Table 1 for the critical values corresponding to $r = 1$
and $\rho = 1$. Their simulation was performed sequentially on intervals with lengths ranging
from $|I_0| = m_0 = 10$ to $|I_K| = 570$ observations using a geometric grid with multiplier
$a = 1.25$, see Section 3.2. (The results are however not sensitive to the choice of $a$.)

Unfortunately, the critical values depend on the parameters of the underlying (G)ARCH
model (in contrast to the constant-volatility model). They generally seem to increase with
the values of the ARCH and GARCH parameters keeping the other one fixed, see Table 1.
To deal with this dependence on the underlying model parameters, we propose to choose
the largest (most conservative) critical values corresponding to any estimated parameter
in the analyzed data. For example, if the largest estimated parameters of GARCH(1,1)
are $\alpha = 0.3$ and $\beta = 0.8$, one should use $\mathfrak{z}(10) = 26.4$ and $\mathfrak{z}(570) = 14.5$, which are the
largest critical values for models with $\alpha = 0.3$, $\beta \le 0.8$ and with $\alpha \le 0.3$, $\beta = 0.8$. (The
proposed procedure is however not overly sensitive to this choice as we shall see later.)
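Because the critical values are linear in the logarithm of the interval length, the two tabulated endpoints determine the whole set. The sketch below recovers $C$ and $D$ from the example values $\mathfrak{z}(10) = 26.4$ and $\mathfrak{z}(570) = 14.5$ and evaluates an intermediate length; the helper names and the intermediate length 100 are illustrative assumptions.

```python
import math
from typing import Tuple


def line_through_endpoints(
    m_first: int, z_first: float, m_last: int, z_last: float
) -> Tuple[float, float]:
    """Solve z = C + D * log(m) for (C, D) from two (length, critical value) pairs."""
    D = (z_last - z_first) / (math.log(m_last) - math.log(m_first))
    C = z_first - D * math.log(m_first)
    return C, D


def critical_value(C: float, D: float, m: int) -> float:
    """Critical value for an interval of length m."""
    return C + D * math.log(m)


C, D = line_through_endpoints(10, 26.4, 570, 14.5)
z_100 = critical_value(C, D, 100)  # critical value for an interval of 100 observations
```

The slope obtained this way is negative, in line with the requirement $D(C) < 0$, so the critical values fall as the intervals grow.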

Finally, let us have a look at the influence of the tuning constants $r$ and $\rho$ in (3.4)
on the critical values for several selected models (Table 2). The influence is significant,
but can be classified in the following way. Whereas increasing $\rho$ generally leads to an
overall decrease of critical values (cf. Theorem 3.1), but primarily for the longer intervals,
increasing $r$ leads to an increase of critical values mainly for the shorter intervals, cf.
(3.4). In simulations and real applications, we verified that a fixed choice such as $r = 1$
and $\rho = 1$ performs well. To optimize the performance of the adaptive methods, one can
however determine the constants $r$ and $\rho$ in a data-dependent way as described in Section 3.4.






Table 1: Critical values $\mathfrak{z}_k = \mathfrak{z}(|I_k|)$ of the supremum LR test for various constant
($\alpha = \beta = 0$), ARCH(1) ($\beta = 0$), and GARCH(1,1) models; $\omega = 1$, $r = 1$, $\rho = 1$, and $\alpha$
and $\beta$ are stated in the table.

α:       0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
|I_k|:   0.0   0.1   0.2   …   0.5   0.6   0.7   …   0.9
10:      …   15.5   16.4   16.8   …   17.3   17.0   17.0   16.9   16.0
570:     5.5   7.2   7.0   7.0   7.5   7.5   7.4   7.3   7.0   6.7
10:      16.3   14.5   …   16.4   15.9   …   16.0   …   9.8   10.7   11.5   12.5   …   13.5   …   32.4
570:     12.7   …   60.9


We use here this strategy for a small grid of $r \in \{0.5, 1.0\}$ and $\rho \in \{0.5, 1.0, 1.5\}$ and
find globally optimal $r$ and $\rho$. We will document though that the differences in the
average absolute PE (3.6) for various values of $r$ and $\rho$ are relatively small.

5.2 Simulation study 

We aim (i) to examine how well the proposed estimation method is able to adapt to long
stable (time-homogeneous) periods and to less stable periods with more frequent volatility
changes, and (ii) to see which adaptively estimated model - local constant, local ARCH,
or local GARCH - performs best in different regimes. To this end, we simulated 100 series
from two change-point GARCH models with a low GARCH effect $(\omega, 0.2, 0.1)$ and a high
GARCH effect $(\omega, 0.2, 0.7)$. Changes in the constant $\omega$ are spread over a time span of 1000
days, see Figure 5.1. There is a long stable period at the beginning (500 days ≈ 2 years)
and end (250 days ≈ 1 year) of the time series with several volatility changes between them.
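A change-point GARCH(1,1) series of this kind can be simulated along the following lines. The sketch assumes the standard recursion $\sigma_t^2 = \omega_t + \alpha Y_{t-1}^2 + \beta\sigma_{t-1}^2$ with Gaussian innovations; the particular levels of $\omega$ used here are hypothetical and are not the values shown in Figure 5.1.

```python
import math
import random
from typing import List, Sequence, Tuple


def simulate_changepoint_garch(
    omegas: Sequence[float], alpha: float, beta: float, seed: int = 0
) -> Tuple[List[float], List[float]]:
    """Simulate Y_t = sigma_t * eps_t with a time-varying constant omega_t.

    Returns the simulated observations and the conditional variances.
    """
    rng = random.Random(seed)
    returns: List[float] = []
    variances: List[float] = []
    for t, omega in enumerate(omegas):
        if t == 0:
            # Start at the stationary variance of the first regime.
            s2 = omega / (1.0 - alpha - beta)
        else:
            s2 = omega + alpha * returns[-1] ** 2 + beta * variances[-1]
        variances.append(s2)
        returns.append(math.sqrt(s2) * rng.gauss(0.0, 1.0))
    return returns, variances


# Hypothetical omega path: stable start (500 days), two changes, stable end (250 days).
omega_path = [1.0] * 500 + [2.0] * 100 + [0.5] * 150 + [1.0] * 250
y_sim, sigma2_sim = simulate_changepoint_garch(omega_path, alpha=0.2, beta=0.1)
```

Repeating the call with different seeds gives independent replications of the same design.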
Table 2: Critical values $\mathfrak{z}(|I_k|)$ of the supremum LR test for some constant volatility,
ARCH(1), and GARCH(1,1) models and various values of $r$ and $\rho$.

Model $(\omega, \alpha, \beta)$    (0.1, 0.0, 0.0)    (0.1, 0.2, 0.0)    (0.1, 0.1, 0.8)
$\rho$    $\mathfrak{z}(10)$  $\mathfrak{z}(570)$    $\mathfrak{z}(10)$  $\mathfrak{z}(570)$    $\mathfrak{z}(10)$  $\mathfrak{z}(570)$

Figure 5.1: GARCH(1,1) parameters of low (upper panel) and high (lower panel) GARCH-effect
simulations for $t = 1, \ldots, 1000$.



5.2.1 Low GARCH-effect 

Let us now discuss simulation results from the low GARCH-effect model. First, we men- 
tion the effect of structural changes in time series on the parameter estimation. Later, we 
compare the performance of all methods in terms of absolute PE. 

Estimating a parametric model from data containing a change point will necessarily
lead to various biases in estimation. For example, Hillebrand (2005) demonstrates that a
change in the volatility level $\omega$ within a sample drives the GARCH parameter $\beta$ very close to
1. This is confirmed when we analyze the parameter estimates for parametric and adaptive
GARCH at each time point $t \in [250, 1000]$ as depicted in Figure 5.2. The parametric
estimates are consistent before the breaks starting at $t = 500$, but the GARCH parameter
$\beta$ becomes inconsistent and converges to 1 once the data contain breaks, $t > 500$. The
locally adaptive estimates are similar to the parametric ones before the breaks and become
rather imprecise after the first change point, but they are not too far from the true value
Figure 5.2: The mean (solid line) and 10% and 90% quantiles (dotted lines) of the parameters
estimated by the parametric (upper row) and locally adaptive (lower row) GARCH
methods, $t = 250, \ldots, 1000$. Thick dotted line represents the true parameter value.



on average and stay consistent (in the sense that the confidence interval covers the true
values). The low precision of estimation can be attributed to the rather short intervals used
for estimation (cf. Figure 5.2 for $t < 500$).

Next, we would like to compare the performance of parametric and adaptive estimation
methods by means of the absolute PE: first for the prediction horizon of one day, $\mathcal{H} = \{1\}$,
and later for prediction two weeks ahead, $\mathcal{H} = \{1, \ldots, 10\}$. To make the results easier to
decipher, we present in what follows PEs averaged over the past month (21 days). The
absolute-PE criterion was also used to determine the optimal values of the parameters $r$ and
$\rho$ (jointly across all simulations and for all $t = 250, \ldots, 1000$). The results differ for
different models: $r = 0.5$, $\rho = 0.5$ for local constant; $r = 0.5$, $\rho = 1.0$ for local ARCH;
and $r = 0.5$, $\rho = 1.5$ for local GARCH.

Let us now compare the adaptively estimated local constant, local ARCH, and local
GARCH models with the parametric GARCH, which is the best performing parametric
model in this setup. Forecasting one period ahead, the average PEs for all methods and
the median lengths of the selected time-homogeneous intervals for the adaptive methods are
presented in Figure 5.3. First of all, one can notice that all methods are sensitive to
jumps in volatility, especially to the first one at $t = 500$: the parametric ones because
they ignore a structural break, the adaptive ones because they use a small amount of data
after a structural change. In general, the local GARCH performs rather similarly to the
Figure 5.3: Left panel: Low GARCH-effect simulations: absolute prediction errors one
period ahead averaged over last month for the parametric GARCH and adaptive local
constant, local ARCH, and local GARCH models; $t \in [250, 1000]$. Right panel: The
median lengths of adaptively selected intervals for all three pointwise adaptive methods.



parametric GARCH for $t < 650$ because it uses all historical data. After the initial volatility
jumps, the local GARCH however outperforms the parametric one, $650 < t < 775$.
Following the last jump at $t = 750$, where the volatility level returns closer to the initial
one, the parametric GARCH is the best of all methods for some time, $775 < t < 850$, until
the adaptive estimation procedure detects the (last) break and, after it, "collects" enough
observations for estimation. Then the local GARCH and local ARCH become preferable
to the parametric model again, $850 < t$. Interestingly, the local ARCH approximation
performs almost as well as both GARCH methods and even outperforms them shortly after
structural breaks (except for the break at $t = 750$), $600 < t < 775$ and $850 < t < 1000$.
Finally, the local constant volatility lags behind the other two adaptive methods
whenever there is a longer time period without a structural break, but keeps up with them
in periods with frequent volatility changes, $500 < t < 650$. All these observations can
also be documented by the absolute PE averaged over the whole period $250 \le t \le 1000$ (we
refer to it as the global PE from now on): the smallest PE is achieved by local ARCH
(0.075), then by local GARCH (0.079), and the worst result is from local constant (0.094).

Additionally, all models are compared using the forecasting horizon of ten days. Most
of the results are the same (e.g., parameter estimates) or similar (e.g., absolute PE) to
forecasting one period ahead due to the fact that all models rely on at most one past
observation. The absolute PEs averaged over one month are summarized in Figure 5.4,
Figure 5.4: Left panel: Low GARCH-effect simulations - absolute prediction errors ten
periods ahead averaged over last month. Right panel: High GARCH-effect simulations -
absolute prediction errors one period ahead averaged over last month. In both cases, the
parametric GARCH, adaptive local constant, local ARCH, and local GARCH models are
presented for $t \in [250, 1000]$.

which reveals that the differences between the local constant volatility, local ARCH, and local
GARCH models are smaller in this case. As a result, it is interesting to note that: (i)
the local constant model becomes a viable alternative to the other methods (it has in fact
the smallest global PE, 0.107, of all adaptive methods); and (ii) the local ARCH model
still outperforms the local GARCH (global PEs are 0.108 and 0.116, respectively) even
though the underlying model is GARCH (with a small value of $\beta = 0.1$ however).

5.2.2 High GARCH-effect 

Let us now discuss the high GARCH-effect model. One would expect both GARCH models
to prevail much more clearly, since the underlying GARCH parameter is higher and
the changes in the volatility level $\omega$ are likely to be small compared to overall volatility
fluctuations. Note that the optimal values of the tuning constants $r$ and $\rho$ differ from the
low GARCH-effect simulations: $r = 0.5$, $\rho = 1.5$ for local constant; $r = 0.5$, $\rho = 1.5$ for
local ARCH; and $r = 1.0$, $\rho = 0.5$ for local GARCH.

Comparing the absolute PEs for the one-period-ahead forecast at each time point (Figure 5.4)
indicates that the adaptive and parametric GARCH estimations perform approximately
equally well. On the other hand, both the parametric and adaptively estimated
ARCH and constant volatility models lag significantly behind. Unreported results confirm,
similarly to the low GARCH-effect simulations, that the differences among methods
are much smaller once a longer prediction horizon of ten days is used.

Figure 5.5: Top panel: The log-returns of the DAX series from January 1990 till December
2002. Bottom panels: The ratios of the absolute prediction errors of the three pointwise
adaptive methods to the parametric GARCH for predictions one period ahead averaged
over one month. The DAX index is considered from January 1992 to March 1997 (left
panel) and from July 1999 to June 2001 (right panel).



6 Applications 

The proposed adaptive pointwise estimation method will now be applied to real time series
consisting of the log-returns of the DAX and S&P 500 stock indices (Sections 6.1 and 6.2).
We will again summarize the results concerning both parametric and adaptive methods by
the absolute PEs one day ahead averaged over one month. As a benchmark, we employ the
parametric GARCH estimated using the last two years of data (500 observations). Since we
however do not have the underlying volatility process now, it is approximated by squared
returns. Despite being noisy, this approximation is unbiased and usually provides the
correct ranking of methods (Andersen and Bollerslev, 1998).

6.1 DAX analysis 

Let us now analyze the log-returns of the German stock index DAX from January 1990
till December 2002 depicted at the top of Figure 5.5. Several periods interesting for
comparing the performance of parametric and adaptive pointwise estimates are selected
since results for the whole period might be hard to decipher at once.

First, consider the estimation results for years 1991 to 1996. Contrary to later periods,
there are structural breaks that are detected practically immediately by all adaptive methods (July
1991 and June 1992; cf. Stapf and Werner, 2003). For the local GARCH, this differs
from the less pronounced structural changes discussed later, which are typically detected only
with a delay of several months. One additional break detected by all methods occurs in
October 1994. Note that the parameters $r$ and $\rho$ were $r = 0.5$, $\rho = 1.5$ for local constant,
$r = 1.0$, $\rho = 1.0$ for local ARCH, and $r = 0.5$, $\rho = 1.5$ for local GARCH.

The results for this period are summarized in Figure 5.5, which depicts the PEs of
each adaptive method relative to the PEs of the parametric GARCH. First, one can notice
that the local constant and local ARCH approximations are preferable till July 1991,
where we have fewer than 500 observations. After the detection of the structural change in
June 1991, all adaptive methods are briefly worse than the parametric GARCH due to
the limited amount of data used, but then outperform the parametric GARCH till the next
structural break in the second half of 1992. A similar behavior can be observed after
the break detected in October 1994, where the local constant and local ARCH models
actually outperform both the parametric and adaptive GARCH. In the other parts of
the data, the performance of all methods is approximately the same, and even though
the adaptive GARCH is overall better than the parametric one, the most interesting fact
is that the adaptively estimated local constant and local ARCH models perform equally
well. In terms of the global PE, the local constant is best (0.829), followed by the local
ARCH (0.844) and local GARCH (0.869). This closely corresponds to our findings in
the simulation study with low GARCH effect in Section 5.2. Note that for other choices of
$r$ and $\rho$, the global PEs are at most 0.835 and 0.851 for the local constant and local
ARCH, respectively. This indicates low sensitivity to the choice of these parameters.

Next, we discuss the estimation results for years 1999 to 2001 ($r = 1.0$ for all methods
now). After the financial markets were hit by the Asian crisis in 1997 and the Russian crisis in
1998, the market headed to a more stable state in 1999. The adaptive methods detected
the structural breaks in the fall of 1997 and 1998. The local GARCH however detected them
with a delay of more than one year, only during 1999. The results in Figure 5.5
confirm that the benefits of the adaptive GARCH are practically negligible compared to
the parametric GARCH in such a case. On the other hand, the local constant and ARCH
methods perform slightly better than both GARCH methods during the first presented
year (July 1999 to June 2000). From July 2000, the situation becomes just the opposite
and the performance of the GARCH models is better (parametric and adaptive GARCH
estimates are practically the same in this period since the last detected structural change
occurred approximately two years earlier). Together with the previous results, this opens the
question of model selection among adaptive procedures as different parametric approximations
might be preferred in different time periods. Judging by the global PE, the
local ARCH provides slightly better predictions on average than the local constant and
local GARCH - despite the "peak" of the PE ratio in the second half of year 2000 (see
Figure 5.5). This however depends on the specific choice of the loss $\Lambda$ in (3.6).

Finally, let us mention that the relatively similar behavior of the local constant and
local ARCH methods is probably due to the use of the ARCH(1) model, which is not sufficient
to capture more complex time developments. Hence, ARCH(p) might be a more
appropriate interim step between the local constant and GARCH models.

6.2 S&P 500 

Now we turn our attention to more recent data regarding the S&P 500 stock index
considered from January 1990 to December 2004, see Figure 6.1. This period is marked by
many substantial events affecting the financial markets, ranging from the September 11, 2001,
terrorist attacks and the war in Iraq (2003) to the crash of the technology stock-market
bubble (2000-2002). For the sake of simplicity, a particular time period is again selected:
year 2003 representing a more volatile period (war in Iraq) and year 2004 being a less
volatile period. All adaptive methods detected rather quickly a structural break at the
beginning of 2003, and additionally, they detected a structural break in the second half
Figure 6.1: Left panel: The log-returns of S&P 500 from January 2000 till December 2004.
Right panel: The ratio of the absolute prediction errors of the three pointwise adaptive
methods to the parametric GARCH for predictions one period ahead averaged over one
month horizon. The S&P 500 index is considered from January 2003 to December 2004.



of 2003, although the adaptive GARCH did so with a delay of more than 8 months. The
ratios of the monthly PEs of all adaptive methods to those of the parametric GARCH are
summarized in Figure 6.1 ($r = 0.5$ and $\rho = 1.5$ for all methods).

At the beginning of year 2003, which together with 2002 corresponds to a more volatile period
(see Figure 6.1), all adaptive methods perform as well as the parametric GARCH. In the
middle of year 2003, the local constant and local ARCH models are able to detect another
structural change (possibly less pronounced than the one at the beginning of 2003 because
of its late detection by the adaptive GARCH). Around this period, the local ARCH briefly
performs worse than the parametric GARCH. From the end of 2003 and in year 2004, all
adaptive methods start to outperform the parametric GARCH, where the reduction of the
PEs due to the adaptive estimation amounts to 20% on average. All adaptive pointwise
estimates exhibit a short period of instability in the first months of 2004, where their
performance temporarily worsens to the level of the parametric GARCH. This corresponds to
the "uncertainty" of the adaptive methods about the length of the interval of homogeneity.
After this short period, the performance of all adaptive methods is comparable, although
the local constant performs overall best of all methods (closely followed by local ARCH)
judged by the global PE.

Similarly to the low GARCH-effect simulations and to the analysis of DAX in Section 6.1,
it seems that the benefit of pointwise adaptive estimation is most pronounced
during periods of stability that follow an unstable period (i.e., year 2004) rather than
during a presumably rapidly changing environment. The reason is that, despite possible
inconsistency of parametric methods under change points, the adaptive methods tend to
have rather large variance when the intervals of time homogeneity become very short.

7 Conclusion 

We extend the idea of adaptive pointwise estimation to parametric CH models. In the specific
case of ARCH and GARCH, which represent particularly difficult cases due to high
data demands and dependence of critical values on underlying parameters, we demonstrate
the use and feasibility of the proposed procedure: on the one hand, the adaptive
procedure, which itself depends on a number of auxiliary parameters, is shown to be rather
insensitive to their choice, and on the other hand, it facilitates the global selection of these
parameters by means of fit or forecasting criteria. The real-data applications highlight
the flexibility of the proposed time-inhomogeneous models since even simple varying-coefficient
models such as constant volatility and ARCH(1) can outperform standard
parametric methods such as GARCH(1,1). Finally, the relatively small differences among
the adaptive estimates based on different parametric approximations indicate that, in the
context of adaptive pointwise estimation, it is sufficient to concentrate on simpler and less
data-intensive models such as ARCH(p), $0 \le p \le 3$, to achieve good forecasts.

A Proofs 

Proof of Corollary 2.2. Given the choice of $\mathfrak{z}_\alpha$, it directly follows from (2.5). □

Proof of Theorem 3.1. Consider the event $\mathcal{B}_k = \{\hat I = I_{k-1}\}$ for some $k \le K$. This
particularly means that $I_{k-1}$ is accepted while $I_k = [T - m_k + 1, T]$ is rejected; that is,
there is $I' = [t', T] \subseteq I_k$ and $\tau \in \mathcal{T}(I_k)$ such that $T_{I_k,\tau} > \mathfrak{z}_k = \mathfrak{z}_{I_k,\mathcal{T}(I_k)}$. For every fixed
$\tau \in \mathcal{T}(I_k)$ and $J = I_k \setminus [\tau + 1, T]$, $J^c = [\tau + 1, T]$, it holds by definition of $T_{I_k,\tau}$ that

$T_{I_k,\tau} \le L_J(\tilde\theta_J) + L_{J^c}(\tilde\theta_{J^c}) - L_{I_k}(\theta_0) = L_J(\tilde\theta_J, \theta_0) + L_{J^c}(\tilde\theta_{J^c}, \theta_0).$

This implies by Theorem 2.1 that $P_{\theta_0}(T_{I_k,\tau} > 2\mathfrak{z}) \le \exp\{e(\lambda, \theta_0) - \lambda\mathfrak{z}\}$. Now,

T-m T—mo+1 2 

Peoi'Bk) < Yl Yl 2exp{e(A,6> )-A3 fc /2} <2^exp{e(A,6> )-A3 fc /2}. 

t'=T-m k +l r=t'+l 

Next, by the Cauchy-Schwartz inequality 

K K 

E eo I L Ik (6 Ik ,0)\ r = Y E °o [ I L Ik (Qik j Ok-\) T 1 (Sfe)] < y^fr) 2 1 -^/k (^/k > ^fc-O 1 " -^ef 
fc=i fe=i 

Under the conditions of Theorem 12.11 it follows similarly to (12.61) that 

E eo \L lK (e lK ,e k ^)\ 2r < (m K /m k ^) 2r ^ r {0 Q ) 

for some constant ^ r (0 o ) and k = 1, . . . , K , and therefore, 

if 

S 0[ jL /A .(^ K ,0)| r < [^ r (0 o )] 1/2 ^m fc (m^/m fc _O r exp{e(A,^o)/2- A3 fc /4} 

k=l 

and the result follows by simple algebra provided that aiA/4 > 1 and a 2 A/4 > 2 . □ 
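
The Cauchy-Schwarz step in this proof bounds each term $E[|L|^r 1(\mathcal{B})]$ by $E^{1/2}|L|^{2r} \, P^{1/2}(\mathcal{B})$. The following is a minimal numerical sketch of that inequality under the empirical measure of a simulated sample. The function name, the half-squared-normal stand-in for the fitted likelihood ratio, and the threshold defining the event are illustrative assumptions and are not taken from the paper.

```python
import math
import random

def cauchy_schwarz_sides(values, indicators, r):
    """Return (lhs, rhs) of E[|X|^r 1(B)] <= sqrt(E|X|^(2r)) * sqrt(P(B)),
    with expectations taken under the empirical measure of the sample."""
    n = len(values)
    lhs = sum(abs(x) ** r * b for x, b in zip(values, indicators)) / n
    second_moment = sum(abs(x) ** (2 * r) for x in values) / n
    prob_b = sum(indicators) / n
    return lhs, math.sqrt(second_moment) * math.sqrt(prob_b)

rng = random.Random(0)
# Hypothetical stand-in for a fitted log-likelihood ratio: half a squared normal.
sample = [0.5 * rng.gauss(0.0, 1.0) ** 2 for _ in range(10_000)]
# Hypothetical event B: the statistic exceeds an arbitrary threshold.
event = [1 if x > 2.0 else 0 for x in sample]

lhs, rhs = cauchy_schwarz_sides(sample, event, r=1.0)
print(lhs <= rhs)
```

Because the empirical measure is itself a probability measure, the inequality holds for the sample up to floating-point rounding, whatever the seed. The same split is what lets the proof separate a moment bound from the probability of the event.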



Proof of Theorem 4.1. The proof is based on the following general result. 



Lemma A.1. Let $P$ and $P_0$ be two measures such that the Kullback-Leibler divergence 
$E \log(dP/dP_0)$ satisfies $E \log(dP/dP_0) \le \Delta < \infty$. Then for any random variable $\zeta$ 
with $E_0 \zeta < \infty$, it holds that $E \log(1 + \zeta) \le \Delta + E_0 \zeta$. 

Proof. By simple algebra one can check that for any fixed $y$ the maximum of the function 
$f(x) = xy - x\log x + x$ is attained at $x = e^y$, leading to the inequality $xy \le x\log x - x + e^y$. 
Using this inequality and the representation $E \log(1 + \zeta) = E_0\{Z \log(1 + \zeta)\}$ with $Z = 
dP/dP_0$, we obtain 

\[ E \log(1 + \zeta) = E_0\{Z \log(1 + \zeta)\} \le E_0(Z \log Z - Z) + E_0(1 + \zeta) = E_0(Z \log Z) + E_0 \zeta - E_0 Z + 1. \]

It remains to note that $E_0 Z = 1$ and $E_0(Z \log Z) = E \log Z$. □ 
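
Both ingredients of Lemma A.1 can be checked numerically: the pointwise inequality $xy \le x\log x - x + e^y$ on a grid, and the lemma itself for a pair of discrete measures. This is a minimal sketch, and the function names, the grid, and the three-point measures are illustrative assumptions rather than quantities from the paper.

```python
import math

def young_gap(x, y):
    """Right side minus left side of x*y <= x*log(x) - x + exp(y), for x > 0."""
    return x * math.log(x) - x + math.exp(y) - x * y

def lemma_sides(p, p0, zeta):
    """For discrete measures p and p0 (positive weights summing to one) and a
    nonnegative variable zeta, return (E log(1 + zeta), KL(p || p0) + E_0 zeta)."""
    lhs = sum(pi * math.log1p(z) for pi, z in zip(p, zeta))
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, p0))
    e0_zeta = sum(qi * z for qi, z in zip(p0, zeta))
    return lhs, kl + e0_zeta

# The gap should be nonnegative everywhere and vanish where x = exp(y).
grid = [(0.1 * i, -2.0 + 0.25 * j) for i in range(1, 60) for j in range(17)]
min_gap = min(young_gap(x, y) for x, y in grid)

# Illustrative three-point measures (assumed values, not from the paper).
p, p0, zeta = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5], [4.0, 1.0, 0.25]
lhs, rhs = lemma_sides(p, p0, zeta)
print(min_gap >= 0.0, lhs <= rhs)
```

Here `kl` plays the role of $E \log(dP/dP_0)$, so the right-hand side is the lemma's bound with $\Delta$ set to the exact divergence.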

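This lemma applied with $\zeta = \varrho(\hat\theta, \theta)/E_\theta \varrho(\hat\theta, \theta)$ yields the result of the theorem in 
view of 

\[ E_\theta(Z_{I,\theta} \log Z_{I,\theta}) = E \log Z_{I,\theta} = E \sum_{t \in I} \log \frac{p[Y_t, g(X_t)]}{p[Y_t, g(X_t(\theta))]}. \]

□ 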
Proof of Corollary 4.2. It is Theorem 4.1 formulated for $\varrho(\theta', \theta) = L_I(\theta', \theta)$. □ 

Proof of Theorem 4.3. The first inequality follows from Corollary 4.2, the second one from 
condition (3.4) and the property $x \ge \log x$ for $x > 0$. □ 



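Proof of Theorem 4.4. Let $\hat{k} = k > k^*$. This means that $I_k$ is not rejected as homogeneous. 
Next, we show that for every $k > k^*$ the inequality $T_{I_k,\tau} \le T_{I_k,\mathcal{T}(I_k)} \le \mathfrak{z}_k$ with 
$\tau = T - m_{k^*} = T - |I_{k^*}|$ implies $L_{I_{k^*}}(\tilde\theta_{I_{k^*}}, \tilde\theta_{I_k}) \le \mathfrak{z}_{k^*}$. Indeed with $J = I_k \setminus I_{k^*}$, this 
means that, by construction, $\mathfrak{z}_k \le \mathfrak{z}_{k^*}$ for $k > k^*$ and 

\[ \mathfrak{z}_k \ge T_{I_k,\tau} = L_{I_{k^*}}(\tilde\theta_{I_{k^*}}, \tilde\theta_{I_k}) + L_J(\tilde\theta_J, \tilde\theta_{I_k}) \ge L_{I_{k^*}}(\tilde\theta_{I_{k^*}}, \tilde\theta_{I_k}). \]

It remains to note that 

\[ |L_{I_{k^*}}(\tilde\theta_{I_{k^*}}, \hat\theta)|^r \le |L_{I_{k^*}}(\tilde\theta_{I_{k^*}}, \hat\theta_{I_{k^*}})|^r \, 1(\hat{k} \le k^*) + \mathfrak{z}_{k^*}^r \, 1(\hat{k} > k^*), \]

which obviously yields the assertion. □ 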
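
The decomposition used in this proof, in which the change-point statistic at a split point equals the sum of the two fitted log-likelihood ratios evaluated at the whole-interval estimate, can be illustrated for the simplest case of constant volatility. The sketch below assumes Gaussian observations with zero mean and unknown variance, and takes the statistic in the form used in the proofs above, $L_J(\tilde\theta_J) + L_{J^c}(\tilde\theta_{J^c}) - L_I(\tilde\theta_I)$. The sample sizes, variances, and function names are illustrative assumptions.

```python
import math
import random

def loglik(y, theta):
    """Gaussian log-likelihood of zero-mean observations y with variance theta."""
    return sum(-0.5 * math.log(2 * math.pi * theta) - v * v / (2 * theta) for v in y)

def mle(y):
    """Maximum likelihood estimate of the variance: the mean of squares."""
    return sum(v * v for v in y) / len(y)

def fitted_ratio(y, theta):
    """L(theta_tilde, theta) = L(theta_tilde) - L(theta); nonnegative since
    theta_tilde maximizes the log-likelihood on y."""
    return loglik(y, mle(y)) - loglik(y, theta)

rng = random.Random(1)
# Hypothetical interval: an older stretch J followed by the recent stretch.
older = [rng.gauss(0.0, 2.0) for _ in range(150)]
recent = [rng.gauss(0.0, 1.0) for _ in range(100)]
whole = older + recent

theta_whole = mle(whole)
# Change-point statistic at the split point between the two stretches.
test_stat = (loglik(recent, mle(recent)) + loglik(older, mle(older))
             - loglik(whole, theta_whole))
recent_part = fitted_ratio(recent, theta_whole)
older_part = fitted_ratio(older, theta_whole)
print(abs(test_stat - (recent_part + older_part)) < 1e-8, test_stat >= recent_part)
```

Since `older_part` is nonnegative, any bound on the statistic carries over to `recent_part`, which mirrors the step from $\mathfrak{z}_k \ge T_{I_k,\tau}$ to the bound on $L_{I_{k^*}}(\tilde\theta_{I_{k^*}}, \tilde\theta_{I_k})$.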